71 research outputs found

    Intelligent Testing Can Be Very Lazy

    Testing is a search process, and a test suite is complete when the search has examined all the corners of the program. Standard models of test suite sizes are gross over-estimates, since they are unaware of the nature of that search space. For example, only a small part of the possible search space is ever exercised in practice. Further, a repeated result is that a few random searches often yield as much information as more thorough search strategies. Hence, only a few tests are needed to sample the range of behaviours of a program.
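    A minimal sketch of the abstract's claim that a few random tests suffice to sample a program's behaviours (the toy `behaviour` function and input range are illustrative assumptions, not from the paper):

    ```python
    import random

    def behaviour(x):
        # Toy program under test with three distinct behaviours (branches).
        if x < 0:
            return "negative"
        if x % 2 == 0:
            return "even"
        return "odd"

    def behaviours_seen(n_tests, seed=0):
        # Randomly sample inputs and record the distinct behaviours exercised.
        rng = random.Random(seed)
        return {behaviour(rng.randint(-100, 100)) for _ in range(n_tests)}

    # Even a small random sample typically exercises every branch,
    # so running many more tests yields little new information.
    print(behaviours_seen(200))
    ```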

    Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

    AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of the generated code with ground-truth references. However, it is not clear which metric to use and which is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics to two state-of-the-art NMT models, using two datasets containing offensive assembly and Python code with their descriptions in English. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.
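    As a hedged illustration of what such output-similarity metrics compute, here is a sketch using Python's stdlib `difflib` as a stand-in for the metrics the paper actually studies (e.g. BLEU or edit distance):

    ```python
    import difflib

    def output_similarity(generated: str, reference: str) -> float:
        # Token-level similarity ratio in [0, 1]; a simple stand-in for the
        # textual-similarity metrics evaluated in the paper.
        tokens_gen, tokens_ref = generated.split(), reference.split()
        return difflib.SequenceMatcher(None, tokens_gen, tokens_ref).ratio()

    # One wrong register out of four tokens -> similarity 0.75.
    print(output_similarity("mov ebx , 1", "mov eax , 1"))  # 0.75
    ```

    The paper's point is that such scores can diverge from human judgement: two snippets may be textually close yet semantically different attacks.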

    Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

    In this work, we present a method to add perturbations to code descriptions, i.e., new natural language (NL) inputs from well-intentioned developers, in the context of security-oriented code, and analyze how and to what extent the perturbations affect the performance of AI offensive code generators. Our experiments show that the performance of the code generators is highly affected by perturbations in the NL descriptions. To enhance the robustness of the code generators, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.
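    A minimal sketch of perturbation-based data augmentation of NL code descriptions (the synonym table and example pair are illustrative assumptions, not the paper's actual perturbation operators):

    ```python
    import random

    # Illustrative synonym table; the paper's perturbations are richer.
    SYNONYMS = {"write": ["store", "save"], "value": ["constant"], "register": ["reg"]}

    def perturb(description, rng):
        # Replace words that have synonyms; the target code stays unchanged.
        return " ".join(rng.choice(SYNONYMS.get(w, [w])) for w in description.split())

    def augment(dataset, n_variants=2, seed=0):
        # Extend the training set with perturbed copies of every
        # (description, code) pair.
        rng = random.Random(seed)
        out = list(dataset)
        for desc, code in dataset:
            out.extend((perturb(desc, rng), code) for _ in range(n_variants))
        return out

    data = [("write the value into the register", "mov eax, 1")]
    print(augment(data))
    ```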

    Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

    Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for translation between different languages and has aroused interest in several research areas, including software engineering. A key step in validating the robustness of NMT models is to evaluate their performance on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, for the specific task of code generation (i.e., generating code starting from a description in natural language), no approach has yet been defined to validate the robustness of NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored to the robustness assessment of such models. We present a preliminary experimental evaluation, showing which types of perturbation affect the model the most and deriving useful insights for future directions. Comment: Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 202
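    One way to quantify such robustness, sketched under assumptions (the exact-match metric and the tiny example are illustrative, not the paper's actual metrics): compare a model's average score on outputs produced from clean versus perturbed descriptions.

    ```python
    def avg_score(metric, outputs, references):
        # Mean per-sample similarity score over a test set.
        return sum(metric(o, r) for o, r in zip(outputs, references)) / len(references)

    def robustness_gap(metric, clean_outputs, perturbed_outputs, references):
        # Drop in the average score when the model is fed perturbed inputs;
        # a larger gap means a less robust model.
        return (avg_score(metric, clean_outputs, references)
                - avg_score(metric, perturbed_outputs, references))

    exact_match = lambda out, ref: 1.0 if out == ref else 0.0
    refs  = ["mov eax, 1", "xor eax, eax"]
    clean = ["mov eax, 1", "xor eax, eax"]   # outputs on original descriptions
    pert  = ["mov eax, 1", "xor ebx, ebx"]   # outputs on perturbed descriptions
    print(robustness_gap(exact_match, clean, pert, refs))  # 0.5
    ```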

    Adequacy of Limited Testing for Knowledge Based Systems


    An Algorithm for Identifying Novel Targets of Transcription Factor Families: Application to Hypoxia-inducible Factor 1 Targets

    Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from huge genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we extracted 105 common patterns that occur in all 15 training genes using suffix trees. Using these 105 common patterns, along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potentially novel HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 putative genes among these 258 were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the laboratory and showed that it is a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
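    A naive sketch of the pattern-extraction step described above: find substrings common to all training sequences. The toy sequences and minimum length are assumptions; the paper's generalized suffix tree achieves the same result in linear time, whereas this brute-force version is quadratic.

    ```python
    def common_patterns(sequences, min_len=4):
        # Substrings of length >= min_len that occur in *every* sequence.
        # Candidates are drawn from the first sequence, then checked
        # against the rest (naive stand-in for a generalized suffix tree).
        first = sequences[0]
        candidates = {first[i:i + min_len]
                      for i in range(len(first) - min_len + 1)}
        return {p for p in candidates if all(p in s for s in sequences[1:])}

    seqs = ["ACGTACGTGG", "TTACGTAA", "GACGTCC"]
    print(sorted(common_patterns(seqs)))  # ['ACGT']
    ```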

    Accelerated Testing for Software Reliability Assessment

    It is necessary to assess the reliability of safety-critical systems to a high degree of confidence before they can be used in the field. One theoretically sound approach for assessing software reliability is to use statistical sampling according to the operational profile. However, this method requires a prohibitive amount of testing. To make this approach more feasible, it is necessary to reduce the time needed to execute a test case. In this paper, we develop transformation methods that speed up the execution of each test case so that a large number of test cases can be run in a shorter time.

    1 Introduction

    Software reliability is becoming a dominant concern in software development, replacing traditional issues such as cost and schedule overruns. This particularly holds for the development of safety-critical process-control systems for aircraft and space vehicles, emergency shutdown systems in nuclear power plants, military applications, etc. Any failure of these systems can resu..
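    A minimal sketch of statistical sampling according to an operational profile, as mentioned above (the profile and operation names are hypothetical, not from the paper):

    ```python
    import random

    def sample_operational_profile(profile, n, seed=0):
        # Draw n test cases so that operation frequencies match the profile
        # (a dict mapping operation -> probability of occurrence in the field).
        rng = random.Random(seed)
        ops, weights = zip(*profile.items())
        return rng.choices(ops, weights=weights, k=n)

    # Hypothetical profile for a process-control system.
    profile = {"startup": 0.1, "steady_state": 0.8, "shutdown": 0.1}
    tests = sample_operational_profile(profile, 1000)
    print(tests.count("steady_state") / 1000)  # close to 0.8
    ```

    The paper's contribution is orthogonal: since reliability estimation needs very many such samples, it transforms each test case to execute faster.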

    Bridging the Gaps of Parallel Programming

    Two basic technology gaps in today's parallel computers are: 1) too much latency in accessing overall system memory, and 2) too little compiler power for existing languages. Consequently, the usability of massively parallel machines is limited. In this article we describe current parallel programming practice and discuss the problems that need to be solved in order to make parallel programming machine independent.

    1 Introduction

    High performance computing has been a key technology for many scientific and engineering disciplines during the past several decades. Parallel computing has evolved remarkably during the same time, but its utility still lags behind that of sequential computing in many respects. Due to the likely stagnation in the increase of clock speeds for microprocessors, parallelism will become a necessity for building ever faster computers in the future. In the mid-90's, sequential architectures and compilers reached maturity and became the commodity on the market...